AIbase

# Visual Question Answering (VQA)

FLODA Deepfake
Apache-2.0
FLODA is a deepfake detection model that combines image captioning with authenticity assessment, framing detection as a visual question answering task.
Text-to-Image English
byh711
113
0
Blip2 Flan T5 Xl Coco
MIT
BLIP-2 is a vision-language model that bootstraps language-image pretraining from a frozen image encoder and a frozen large language model, training only a lightweight Querying Transformer (Q-Former) between them. It supports tasks such as image captioning and visual question answering.
Image-to-Text Transformers English
Salesforce
2,379
14
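As a rough illustration of how a BLIP-2 checkpoint like this one can be used for visual question answering, here is a minimal sketch. It assumes the Hugging Face `transformers` and `Pillow` packages are installed, and that the model is published under the repository id `Salesforce/blip2-flan-t5-xl-coco` (an assumption inferred from the listing name, not confirmed by this page). The helper names `build_vqa_prompt` and `answer_question` are hypothetical.

```python
def build_vqa_prompt(question: str) -> str:
    """Format a question in the 'Question: ... Answer:' style often used to prompt BLIP-2."""
    return f"Question: {question.strip()} Answer:"


def answer_question(
    image_path: str,
    question: str,
    model_id: str = "Salesforce/blip2-flan-t5-xl-coco",  # assumed repository id
) -> str:
    """Load the model, ask one question about one image, and return the decoded answer."""
    # Heavy imports stay local so the prompt helper works without them installed.
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        images=image, text=build_vqa_prompt(question), return_tensors="pt"
    )
    generated_ids = model.generate(**inputs, max_new_tokens=20)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
```

Calling `answer_question("photo.jpg", "How many people are in the picture?")` would download the weights on first use, which are several gigabytes, so a GPU or a smaller checkpoint may be preferable in practice.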
Git Base
MIT
GIT (GenerativeImage2Text) is a Transformer decoder conditioned on both CLIP image tokens and text tokens, designed for image-to-text generation tasks.
Image-to-Text Transformers Supports Multiple Languages
microsoft
365.74k
93
© 2025 AIbase